Ensemble Methods Based on Bias–variance Analysis

Authors

  • Giorgio Valentini
  • Francesco Masulli
Abstract

Ensembles of classifiers represent one of the main research directions in machine learning. Two main theories are invoked to explain the success of ensemble methods. The first considers ensembles in the framework of large-margin classifiers, showing that ensembles enlarge the margins and thereby enhance the generalization capabilities of learning algorithms. The second is based on the classical bias–variance decomposition of the error, and shows that ensembles can reduce variance and/or bias. In accordance with this second approach, this thesis pursues a twofold purpose: on the one hand, it explores the possibility of using the bias–variance decomposition of the error as an analytical tool to study the properties of learning algorithms; on the other hand, it explores the possibility of designing ensemble methods based on bias–variance analysis of the error. First, the bias–variance decomposition of the error is considered as a tool to analyze learning algorithms. This work shows how to apply Domingos' and James' theories of the bias–variance decomposition of the error to the analysis of learning algorithms. Extensive experiments with Support Vector Machines (SVMs) are presented, and the analysis of the relationships between bias, variance, kernel type and kernel parameters provides a characterization of the error decomposition, offering insights into the way SVMs learn. In a similar way, bias–variance analysis is applied as a tool to explain the properties of ensembles of learners. A bias–variance analysis of ensembles based on resampling techniques shows that, as expected, bagging is a variance-reduction ensemble method, while the theoretical property of canceled variance holds only for Breiman's random aggregated predictors. In addition to analyzing learning algorithms, bias–variance analysis can offer guidance in the design of ensemble methods.
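As a rough illustration of how Domingos' decomposition of the 0/1 loss can be estimated in practice (a sketch, not the thesis' own code), the block below takes the predictions of one learner trained on several training sets and evaluated on a fixed test set. It assumes two classes and noise-free labels, the case in which Domingos' theory gives error = bias + V_u - V_b, where V_u is the variance on unbiased points and V_b the variance on biased points. The names `domingos_decomposition` and `BiasVariance` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Sequence


@dataclass
class BiasVariance:
    error: float         # mean 0/1 loss over all (training set, test point) pairs
    bias: float          # fraction of test points whose main prediction is wrong
    unbiased_var: float  # variance on points with zero bias (adds to the error)
    biased_var: float    # variance on points with bias one (subtracts from it)
    net_var: float       # unbiased_var - biased_var


def domingos_decomposition(predictions: Sequence[Sequence[Hashable]],
                           targets: Sequence[Hashable]) -> BiasVariance:
    """predictions[d][i] is the label that the model trained on the d-th
    training set assigns to test point i; targets[i] is its noise-free label."""
    n_sets, n_points = len(predictions), len(targets)
    err = bias = v_unb = v_b = 0.0
    for i, t in enumerate(targets):
        column = [predictions[d][i] for d in range(n_sets)]
        main = Counter(column).most_common(1)[0][0]   # main prediction (mode; ties -> first seen)
        b = 1.0 if main != t else 0.0                 # bias at x_i
        v = sum(y != main for y in column) / n_sets  # variance at x_i
        err += sum(y != t for y in column) / n_sets
        bias += b
        if b == 0.0:
            v_unb += v
        else:
            v_b += v
    return BiasVariance(err / n_points, bias / n_points, v_unb / n_points,
                        v_b / n_points, (v_unb - v_b) / n_points)


# Usage: four training sets, three test points, two classes.
targets = [1, 0, 1]
predictions = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
bv = domingos_decomposition(predictions, targets)
print(bv)
# In the two-class, noise-free case: error == bias + unbiased_var - biased_var
print(abs(bv.error - (bv.bias + bv.net_var)) < 1e-12)
```

The split into unbiased and biased variance is what lets an ensemble method "reduce variance" in two different senses: lowering V_u lowers the error, while lowering V_b on already-biased points can actually raise it.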
This work shows that bias–variance analysis provides a theoretical and practical tool for developing new ensemble methods well tuned to the characteristics of a specific base learner. On the basis of the analysis and experiments performed on SVMs and bagged ensembles of SVMs, new ensemble methods based on bias–variance analysis are proposed. In particular, Lobag (Low bias bagging) selects low-bias base learners and then combines them through bootstrap aggregating. This approach affects both bias, through the selection of low-bias base learners, and variance, through bootstrap aggregation of the selected learners. Moreover, a potential new class of ensemble methods (heterogeneous ensembles of SVMs), which aggregate different SVM models on the basis of their bias–variance characteristics, is introduced. From an application standpoint, it is also shown that the proposed ensemble methods can be successfully applied to the analysis of DNA microarray data.
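The two steps attributed to Lobag (select a low-bias base learner, then bag it) can be sketched as follows. This is a hedged illustration under several assumptions: a k-nearest-neighbour classifier on 1-D data stands in for the SVMs used in the thesis, its parameter `k` stands in for the kernel parameters, bias is estimated with out-of-bag predictions, and ties between candidates go to the earlier one. The functions `knn_predict`, `oob_bias` and `lobag` are hypothetical names.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, int]  # (feature x, class label); 1-D inputs keep the sketch short


def knn_predict(sample: Sequence[Point], k: int, x: float) -> int:
    """Stand-in base learner (not an SVM): majority vote of the k nearest points.
    Here k plays the role of the hyperparameter that trades bias for variance."""
    nearest = sorted(sample, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def oob_bias(data: Sequence[Point], k: int, n_bags: int, rng: random.Random) -> float:
    """Out-of-bag estimate of Domingos' bias for k-NN with the given k."""
    n = len(data)
    oob_votes: List[List[int]] = [[] for _ in range(n)]
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap replicate
        sample = [data[j] for j in idx]
        in_bag = set(idx)
        for i, (x, _) in enumerate(data):
            if i not in in_bag:  # x_i was not used to train this replicate
                oob_votes[i].append(knn_predict(sample, k, x))
    biased = counted = 0
    for (_, y), votes in zip(data, oob_votes):
        if votes:  # skip points that were never out of bag
            main = Counter(votes).most_common(1)[0][0]  # main prediction
            biased += int(main != y)
            counted += 1
    return biased / counted if counted else 0.0


def lobag(data: Sequence[Point], ks: Sequence[int], n_bags: int = 25,
          seed: int = 0) -> Tuple[int, Dict[int, float], Callable[[float], int]]:
    rng = random.Random(seed)
    # 1. Estimate the bias of each candidate base learner.
    biases = {k: oob_bias(data, k, n_bags, rng) for k in ks}
    # 2. Keep the lowest-bias one (ties go to the earlier candidate).
    best_k = min(ks, key=lambda k: biases[k])
    # 3. Bag the selected learner: majority vote over fresh bootstrap replicates.
    n = len(data)
    samples = [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_bags)]

    def predict(x: float) -> int:
        votes = Counter(knn_predict(s, best_k, x) for s in samples)
        return votes.most_common(1)[0][0]

    return best_k, biases, predict


# Usage on a toy, well-separated 1-D data set.
data = [(-1.0 - 0.1 * i, 0) for i in range(10)] + [(1.0 + 0.1 * i, 1) for i in range(10)]
best_k, biases, predict = lobag(data, ks=[1, 3, 7])
print("selected k:", best_k, "OOB bias estimates:", biases)
print(predict(-1.05), predict(1.05))
```

Selecting on bias and then bagging mirrors the division of labour described above: the selection step targets bias, and the bootstrap aggregation step targets the (typically higher) variance of the low-bias learner.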


Similar resources

Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods

Bias-variance analysis provides a tool to study learning algorithms and can be used to properly design ensemble methods well tuned to the properties of a specific base learner. Indeed the effectiveness of ensemble methods critically depends on accuracy, diversity and learning characteristics of base learners. We present an extended experimental analysis of bias-variance decomposition of the err...


Improved customer choice predictions using ensemble methods

In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees and thus leads to improved predictions. We give experimental results for two real-life marketing datasets using decision trees, ...


Ensemble Classification for Relational Domains

Ensemble classification methods have been shown to produce more accurate predictions than the base component models (Bauer and Kohavi 1999). Due to their effectiveness, ensemble approaches have been applied in a wide range of domains to improve classification. The expected prediction error of classification models can be decomposed into bias and variance (Friedman 1997). Ensemble methods that i...


Online tree-based ensembles and option trees for regression on evolving data streams

The emergence of ubiquitous sources of streaming data has given rise to the popularity of algorithms for online machine learning. In that context, Hoeffding trees represent the state-of-the-art algorithms for online classification. Their popularity stems in large part from their ability to process large quantities of data with a speed that goes beyond the processing power of any other streaming...


Bias and Variance of Rotation-Based Ensembles

In machine learning, ensembles are combinations of classifiers. Their objective is to improve accuracy. In previous works, we have presented a method for the generation of ensembles, named rotation-based. It transforms the training data set: it randomly groups the attributes into different subgroups and applies an axis rotation to each group. If the used method for the induction of the c...




Publication date: 2003